[linux-nvidia-6.17] Use architecture specific HBM training status register#331
Open
ankita-nv wants to merge 1 commit into NVIDIA:24.04_linux-nvidia-6.17-next from
nvmochs (Collaborator) approved these changes on Feb 25, 2026:
This looks good to me.
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Collaborator
@ankita-nv Are there plans to upstream this patch?
Author
Yeah, I'll post it shortly after internal review.
Collaborator
Ankit requested that we hold on getting this integrated.
58fc644 to 4b04466
…diness check
Blackwell-Next GPUs report device readiness via the CXL DVSEC Range 1 Low
register (offset 0x1C) instead of the BAR0 HBM training register used by
GB200. GPU memory readiness is checked by polling the Memory_Active bit
(bit 1) for up to the duration encoded in Memory_Active_Timeout
(bits 15:13).
Add runtime detection by checking for the presence of the DVSEC register.
Route to the new method if it is present; otherwise continue using the
legacy approach.
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
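The readiness check described in this commit message can be sketched in userspace C as follows. This is only an illustration under stated assumptions, not the code from the patch: `read_reg_fn`, `fake_reg`, `fake_read`, and `wait_mem_active` are hypothetical names, and the mapping of the timeout encoding to seconds (4^n seconds for encodings 0 to 4) is an assumption based on the CXL specification's definition of Memory_Active_Timeout rather than anything stated in this PR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout as stated in the commit message. */
#define MEM_ACTIVE_BIT            (1u << 1) /* Memory_Active, bit 1 */
#define MEM_ACTIVE_TIMEOUT_SHIFT  13        /* Memory_Active_Timeout, bits 15:13 */
#define MEM_ACTIVE_TIMEOUT_MASK   0x7u

/* Hypothetical callback standing in for a config read at DVSEC base + 0x1C. */
typedef uint32_t (*read_reg_fn)(void *ctx);

static bool mem_active(uint32_t reg)
{
    return (reg & MEM_ACTIVE_BIT) != 0;
}

/*
 * Assumption: encodings 0..4 mean 1, 4, 16, 64, 256 seconds (4^n);
 * other encodings are treated as reserved and reported as 0.
 */
static unsigned int mem_active_timeout_secs(uint32_t reg)
{
    unsigned int enc = (reg >> MEM_ACTIVE_TIMEOUT_SHIFT) & MEM_ACTIVE_TIMEOUT_MASK;

    return enc <= 4 ? 1u << (2 * enc) : 0;
}

/* Poll at most max_polls times; a real driver would sleep between reads. */
static bool wait_mem_active(read_reg_fn rd, void *ctx, unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (mem_active(rd(ctx)))
            return true;
    }
    return false;
}

/* Fake register for demonstration: reports inactive for a number of reads. */
struct fake_reg {
    unsigned int reads_until_active;
    uint32_t value;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_reg *r = ctx;

    if (r->reads_until_active > 0) {
        r->reads_until_active--;
        return r->value & ~MEM_ACTIVE_BIT;
    }
    return r->value | MEM_ACTIVE_BIT;
}
```

In a driver, the poll budget would be derived from `mem_active_timeout_secs()` and the chosen sleep interval; the fake reader exists only so the loop can be exercised without hardware.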
4b04466 to f400624
Blackwell-Next GPUs use a different BAR0 offset (0xAD00BC) for the HBM
training status register than GB200 (0x200BC). Add runtime detection by
reading the architecture field from PMC BOOT_42 and selecting the
appropriate offset when polling for device readiness.
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
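The offset selection described in this commit message could look roughly like the sketch below. The two offsets come from the description above. The position of the architecture field within PMC BOOT_42 and the architecture value for Blackwell-Next are not given in this PR, so `BOOT_42_ARCH_SHIFT`, `BOOT_42_ARCH_MASK`, and `ARCH_BLACKWELL_NEXT` are placeholders chosen for illustration only.

```c
#include <stdint.h>

/* BAR0 offsets of the HBM training status register, from the PR description. */
#define HBM_TRAINING_OFFSET_GB200           0x200BCu
#define HBM_TRAINING_OFFSET_BLACKWELL_NEXT  0xAD00BCu

/* Placeholders: field position and architecture ID are NOT from the source. */
#define BOOT_42_ARCH_SHIFT   24
#define BOOT_42_ARCH_MASK    0x3Fu
#define ARCH_BLACKWELL_NEXT  0x20u /* hypothetical value */

/* Extract the architecture field from a raw PMC BOOT_42 value. */
static uint32_t boot42_arch(uint32_t boot42)
{
    return (boot42 >> BOOT_42_ARCH_SHIFT) & BOOT_42_ARCH_MASK;
}

/* Pick the register offset to poll, defaulting to the GB200 offset. */
static uint32_t hbm_training_offset(uint32_t boot42)
{
    if (boot42_arch(boot42) == ARCH_BLACKWELL_NEXT)
        return HBM_TRAINING_OFFSET_BLACKWELL_NEXT;
    return HBM_TRAINING_OFFSET_GB200;
}
```

Defaulting to the GB200 offset keeps existing devices on the path they already use, so only an explicitly recognised architecture changes behaviour.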